Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 37307 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 4.0 MiB |
| Average record size in memory | 112.0 B |
Variable types
| Numeric | 12 |
|---|---|
| Unsupported | 1 |
| Categorical | 1 |
views is highly correlated with likes and 1 other fields | High correlation |
likes is highly correlated with views and 2 other fields | High correlation |
dislikes is highly correlated with likes and 1 other fields | High correlation |
comment_count is highly correlated with views and 2 other fields | High correlation |
views is highly correlated with likes and 2 other fields | High correlation |
likes is highly correlated with views and 2 other fields | High correlation |
dislikes is highly correlated with views and 2 other fields | High correlation |
comment_count is highly correlated with views and 2 other fields | High correlation |
Ratio_View_likes is highly correlated with Ratio_views_comment_count and 1 other fields | High correlation |
Ratio_View_dislikes is highly correlated with Ratio_likes_dislikes | High correlation |
Ratio_views_comment_count is highly correlated with Ratio_View_likes | High correlation |
Ratio_likes_dislikes is highly correlated with Ratio_View_likes and 1 other fields | High correlation |
views is highly correlated with likes and 2 other fields | High correlation |
likes is highly correlated with views and 2 other fields | High correlation |
dislikes is highly correlated with views and 2 other fields | High correlation |
comment_count is highly correlated with views and 2 other fields | High correlation |
Ratio_View_likes is highly correlated with Ratio_View_dislikes | High correlation |
Ratio_View_dislikes is highly correlated with Ratio_View_likes | High correlation |
views is highly correlated with comment_count and 1 other fields | High correlation |
comment_count is highly correlated with views and 2 other fields | High correlation |
likes is highly correlated with views and 2 other fields | High correlation |
dislikes is highly correlated with comment_count and 1 other fields | High correlation |
dislikes is highly skewed (γ₁ = 28.43545183) | Skewed |
Ratio_View_likes is highly skewed (γ₁ = 54.7018187) | Skewed |
Ratio_views_comment_count is highly skewed (γ₁ = 74.25807043) | Skewed |
df_index has unique values | Unique |
publish_time_1 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
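The "Skewed" alerts above report γ₁, the skewness of a column. As a minimal sketch, assuming the plain moment-based definition (third central moment divided by the second central moment raised to the power 3/2), skewness can be computed as below. The report may apply a sample bias correction that this sketch does not reproduce, and the function name `skewness` and the sample data are illustrative only.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 2, 2, 3, 1000]  # hypothetical counts with one extreme value

print(skewness(symmetric))        # zero for symmetric data
print(skewness(long_right_tail))  # clearly positive: long tail to the right
```

A single very large value is enough to push the skewness well above zero, which is the pattern the alerts flag for count-like columns.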
Reproduction
| Analysis started | 2021-09-02 10:38:57.361995 |
|---|---|
| Analysis finished | 2021-09-02 10:40:13.405337 |
| Duration | 1 minute and 16.04 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 37307 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19382.94486 |
| Minimum | 0 |
|---|---|
| Maximum | 38915 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1920.3 |
| Q1 | 9707.5 |
| median | 19362 |
| Q3 | 29002 |
| 95-th percentile | 36983.7 |
| Maximum | 38915 |
| Range | 38915 |
| Interquartile range (IQR) | 19294.5 |
Descriptive statistics
| Standard deviation | 11209.38901 |
|---|---|
| Coefficient of variation (CV) | 0.5783119692 |
| Kurtosis | -1.188597382 |
| Mean | 19382.94486 |
| Median Absolute Deviation (MAD) | 9648 |
| Skewness | 0.006894843904 |
| Sum | 723119524 |
| Variance | 125650402 |
| Monotonicity | Strictly increasing |
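Several figures in the tables above are functions of one another: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below cross-checks the values reported for the variable summarized above. The numbers are copied from the tables, the variable names are illustrative, and the tolerance is an assumption chosen to absorb rounding in the report.

```python
import math

# Values copied from the tables above.
n = 37307
mean = 19382.94486
std = 11209.38901
q1, q3 = 9707.5, 29002
minimum, maximum = 0, 38915

derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
    "sum": mean * n,
}
reported = {
    "range": 38915,
    "iqr": 19294.5,
    "cv": 0.5783119692,
    "variance": 125650402,
    "sum": 723119524,
}

for name, value in derived.items():
    # Reported figures are rounded, so compare with a relative tolerance.
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.10g} reported={reported[name]} match={ok}")
```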
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 25761 | 1 | < 0.1% |
| 25781 | 1 | < 0.1% |
| 25782 | 1 | < 0.1% |
| 25783 | 1 | < 0.1% |
| 25784 | 1 | < 0.1% |
| 25785 | 1 | < 0.1% |
| 25786 | 1 | < 0.1% |
| 25787 | 1 | < 0.1% |
| 25788 | 1 | < 0.1% |
| Other values (37297) | 37297 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 38915 | 1 | |
| 38914 | 1 | |
| 38913 | 1 | |
| 38912 | 1 | |
| 38911 | 1 | |
| 38910 | 1 | |
| 38909 | 1 | |
| 38908 | 1 | |
| 38907 | 1 | |
| 38906 | 1 |
category_id
Real number (ℝ≥0)
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.73945908 |
| Minimum | 1 |
|---|---|
| Maximum | 43 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 10 |
| median | 20 |
| Q3 | 24 |
| 95-th percentile | 26 |
| Maximum | 43 |
| Range | 42 |
| Interquartile range (IQR) | 14 |
Descriptive statistics
| Standard deviation | 7.724900544 |
|---|---|
| Coefficient of variation (CV) | 0.4614785045 |
| Kurtosis | -1.095539639 |
| Mean | 16.73945908 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -0.3524889223 |
| Sum | 624499 |
| Variance | 59.67408841 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=16)
| Value | Count | Frequency (%) |
| 10 | 13537 | |
| 24 | 8626 | |
| 22 | 2738 | 7.3% |
| 1 | 2408 | 6.5% |
| 26 | 1921 | 5.1% |
| 23 | 1794 | 4.8% |
| 17 | 1784 | 4.8% |
| 20 | 1672 | 4.5% |
| 25 | 1124 | 3.0% |
| 15 | 527 | 1.4% |
| Other values (6) | 1176 | 3.2% |
| Value | Count | Frequency (%) |
| 1 | 2408 | 6.5% |
| 2 | 136 | 0.4% |
| 10 | 13537 | |
| 15 | 527 | 1.4% |
| 17 | 1784 | 4.8% |
| 19 | 89 | 0.2% |
| 20 | 1672 | 4.5% |
| 22 | 2738 | 7.3% |
| 23 | 1794 | 4.8% |
| 24 | 8626 |
| Value | Count | Frequency (%) |
| 43 | 20 | 0.1% |
| 29 | 38 | 0.1% |
| 28 | 444 | 1.2% |
| 27 | 449 | 1.2% |
| 26 | 1921 | 5.1% |
| 25 | 1124 | 3.0% |
| 24 | 8626 | |
| 23 | 1794 | 4.8% |
| 22 | 2738 | 7.3% |
| 20 | 1672 | 4.5% |
views
Real number (ℝ≥0)
| Distinct | 36984 |
|---|---|
| Distinct (%) | 99.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6004065.957 |
| Minimum | 1014 |
|---|---|
| Maximum | 424538912 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 1014 |
|---|---|
| 5-th percentile | 48527.1 |
| Q1 | 255845 |
| median | 995938 |
| Q3 | 3750771 |
| 95-th percentile | 24757060.4 |
| Maximum | 424538912 |
| Range | 424537898 |
| Interquartile range (IQR) | 3494926 |
Descriptive statistics
| Standard deviation | 19270692.81 |
|---|---|
| Coefficient of variation (CV) | 3.209607113 |
| Kurtosis | 113.42635 |
| Mean | 6004065.957 |
| Median Absolute Deviation (MAD) | 891506 |
| Skewness | 9.00864677 |
| Sum | 2.239936887 × 10¹¹ |
| Variance | 3.713596012 × 10¹⁴ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 191321 | 5 | < 0.1% |
| 18506 | 3 | < 0.1% |
| 191327 | 3 | < 0.1% |
| 46059 | 3 | < 0.1% |
| 53685 | 2 | < 0.1% |
| 83448 | 2 | < 0.1% |
| 42150 | 2 | < 0.1% |
| 48308 | 2 | < 0.1% |
| 3490561 | 2 | < 0.1% |
| 630614 | 2 | < 0.1% |
| Other values (36974) | 37281 |
| Value | Count | Frequency (%) |
| 1014 | 1 | |
| 1505 | 1 | |
| 1540 | 1 | |
| 1559 | 1 | |
| 1566 | 1 | |
| 1571 | 1 | |
| 1577 | 1 | |
| 1581 | 1 | |
| 1583 | 1 | |
| 1858 | 1 |
| Value | Count | Frequency (%) |
| 424538912 | 1 | |
| 413586699 | 1 | |
| 402650804 | 1 | |
| 392036878 | 1 | |
| 382401497 | 1 | |
| 372399338 | 1 | |
| 362111555 | 1 | |
| 349987176 | 1 | |
| 339629489 | 1 | |
| 337621571 | 1 |
likes
Real number (ℝ≥0)
| Distinct | 29986 |
|---|---|
| Distinct (%) | 80.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 138137.6737 |
| Minimum | 5 |
|---|---|
| Maximum | 5613827 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 885 |
| Q1 | 6362 |
| median | 26359 |
| Q3 | 119267.5 |
| 95-th percentile | 578678 |
| Maximum | 5613827 |
| Range | 5613822 |
| Interquartile range (IQR) | 112905.5 |
Descriptive statistics
| Standard deviation | 354840.2362 |
|---|---|
| Coefficient of variation (CV) | 2.56874339 |
| Kurtosis | 64.96627895 |
| Mean | 138137.6737 |
| Median Absolute Deviation (MAD) | 24282 |
| Skewness | 6.790122609 |
| Sum | 5153502193 |
| Variance | 1.259115933 × 10¹¹ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 35 | 15 | < 0.1% |
| 403 | 13 | < 0.1% |
| 22 | 12 | < 0.1% |
| 53 | 11 | < 0.1% |
| 321 | 10 | < 0.1% |
| 3056 | 9 | < 0.1% |
| 333 | 9 | < 0.1% |
| 828 | 9 | < 0.1% |
| 525 | 9 | < 0.1% |
| 290 | 9 | < 0.1% |
| Other values (29976) | 37201 |
| Value | Count | Frequency (%) |
| 5 | 3 | < 0.1% |
| 13 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 18 | 3 | < 0.1% |
| 19 | 2 | < 0.1% |
| 20 | 5 | |
| 22 | 12 | |
| 29 | 4 | < 0.1% |
| 30 | 5 | |
| 31 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 5613827 | 1 | |
| 5595203 | 1 | |
| 5530568 | 1 | |
| 5486349 | 1 | |
| 5444541 | 1 | |
| 5439015 | 1 | |
| 5426274 | 1 | |
| 5386959 | 1 | |
| 5366150 | 1 | |
| 5329161 | 1 |
dislikes
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED
| Distinct | 10919 |
|---|---|
| Distinct (%) | 29.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7192.118342 |
| Minimum | 1 |
|---|---|
| Maximum | 1753274 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 34 |
| Q1 | 213 |
| median | 842 |
| Q3 | 3422 |
| 95-th percentile | 27423.5 |
| Maximum | 1753274 |
| Range | 1753273 |
| Interquartile range (IQR) | 3209 |
Descriptive statistics
| Standard deviation | 41230.31061 |
|---|---|
| Coefficient of variation (CV) | 5.732707479 |
| Kurtosis | 1072.178273 |
| Mean | 7192.118342 |
| Median Absolute Deviation (MAD) | 762 |
| Skewness | 28.43545183 |
| Sum | 268316359 |
| Variance | 1699938513 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 15 | 104 | 0.3% |
| 27 | 84 | 0.2% |
| 22 | 82 | 0.2% |
| 17 | 79 | 0.2% |
| 11 | 77 | 0.2% |
| 79 | 77 | 0.2% |
| 77 | 76 | 0.2% |
| 12 | 76 | 0.2% |
| 3 | 76 | 0.2% |
| 44 | 75 | 0.2% |
| Other values (10909) | 36501 |
| Value | Count | Frequency (%) |
| 1 | 34 | |
| 2 | 66 | |
| 3 | 76 | |
| 4 | 46 | |
| 5 | 15 | < 0.1% |
| 6 | 31 | |
| 7 | 59 | |
| 8 | 64 | |
| 9 | 33 | |
| 10 | 48 |
| Value | Count | Frequency (%) |
| 1753274 | 1 | |
| 1739579 | 1 | |
| 1732859 | 1 | |
| 1727826 | 1 | |
| 1722307 | 1 | |
| 1712284 | 1 | |
| 1704861 | 1 | |
| 1694945 | 1 | |
| 1683321 | 1 | |
| 1668460 | 1 |
comment_count
Real number (ℝ≥0)
| Distinct | 15654 |
|---|---|
| Distinct (%) | 42.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12976.33608 |
| Minimum | 1 |
|---|---|
| Maximum | 1228655 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 137.3 |
| Q1 | 759.5 |
| median | 2653 |
| Q3 | 9589.5 |
| 95-th percentile | 51506.1 |
| Maximum | 1228655 |
| Range | 1228654 |
| Interquartile range (IQR) | 8830 |
Descriptive statistics
| Standard deviation | 44236.03692 |
|---|---|
| Coefficient of variation (CV) | 3.408977438 |
| Kurtosis | 240.1136368 |
| Mean | 12976.33608 |
| Median Absolute Deviation (MAD) | 2307 |
| Skewness | 12.71974664 |
| Sum | 484108170 |
| Variance | 1956826962 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 201 | 43 | 0.1% |
| 50 | 39 | 0.1% |
| 40 | 38 | 0.1% |
| 53 | 36 | 0.1% |
| 37 | 34 | 0.1% |
| 30 | 33 | 0.1% |
| 257 | 30 | 0.1% |
| 346 | 29 | 0.1% |
| 271 | 28 | 0.1% |
| 316 | 28 | 0.1% |
| Other values (15644) | 36969 |
| Value | Count | Frequency (%) |
| 1 | 3 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3 | 7 | < 0.1% |
| 4 | 17 | |
| 5 | 9 | < 0.1% |
| 6 | 8 | < 0.1% |
| 7 | 28 | |
| 8 | 2 | < 0.1% |
| 9 | 9 | < 0.1% |
| 10 | 12 |
| Value | Count | Frequency (%) |
| 1228655 | 1 | |
| 1225326 | 1 | |
| 1213172 | 1 | |
| 1204867 | 1 | |
| 1197130 | 1 | |
| 1189456 | 1 | |
| 1165350 | 1 | |
| 1163977 | 1 | |
| 1142269 | 1 | |
| 1114800 | 1 |
publish_weekday
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 291.6 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 7 |
| Mean length | 7.2012759 |
| Min length | 6 |
Characters and Unicode
| Total characters | 268658 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Friday |
|---|---|
| 2nd row | Sunday |
| 3rd row | Friday |
| 4th row | Monday |
| 5th row | Monday |
Common Values
| Value | Count | Frequency (%) |
| Friday | 7407 | |
| Wednesday | 7088 | |
| Thursday | 6886 | |
| Tuesday | 5622 | |
| Monday | 5614 | |
| Sunday | 2611 | 7.0% |
| Saturday | 2079 | 5.6% |
Length
Histogram of lengths of the category
Pie chart
Most occurring characters
| Value | Count | Frequency (%) |
| d | 44395 | |
| a | 39386 | |
| y | 37307 | |
| e | 19798 | |
| s | 19596 | |
| u | 17198 | 6.4% |
| r | 16372 | 6.1% |
| n | 15313 | 5.7% |
| T | 12508 | 4.7% |
| F | 7407 | 2.8% |
| Other values (7) | 39378 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 231351 | |
| Uppercase Letter | 37307 | 13.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| d | 44395 | |
| a | 39386 | |
| y | 37307 | |
| e | 19798 | |
| s | 19596 | |
| u | 17198 | 7.4% |
| r | 16372 | 7.1% |
| n | 15313 | 6.6% |
| i | 7407 | 3.2% |
| h | 6886 | 3.0% |
| Other values (2) | 7693 | 3.3% |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 12508 | |
| F | 7407 | |
| W | 7088 | |
| M | 5614 | |
| S | 4690 | 12.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 268658 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| d | 44395 | |
| a | 39386 | |
| y | 37307 | |
| e | 19798 | |
| s | 19596 | |
| u | 17198 | 6.4% |
| r | 16372 | 6.1% |
| n | 15313 | 5.7% |
| T | 12508 | 4.7% |
| F | 7407 | 2.8% |
| Other values (7) | 39378 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 268658 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| d | 44395 | |
| a | 39386 | |
| y | 37307 | |
| e | 19798 | |
| s | 19596 | |
| u | 17198 | 6.4% |
| r | 16372 | 6.1% |
| n | 15313 | 5.7% |
| T | 12508 | 4.7% |
| F | 7407 | 2.8% |
| Other values (7) | 39378 |
title_length
Real number (ℝ≥0)
| Distinct | 94 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.66486182 |
| Minimum | 7 |
|---|---|
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 34 |
| median | 48 |
| Q3 | 62 |
| 95-th percentile | 87 |
| Maximum | 100 |
| Range | 93 |
| Interquartile range (IQR) | 28 |
Descriptive statistics
| Standard deviation | 19.7604799 |
|---|---|
| Coefficient of variation (CV) | 0.3978764699 |
| Kurtosis | -0.4201283405 |
| Mean | 49.66486182 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 0.4497771263 |
| Sum | 1852847 |
| Variance | 390.4765659 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 34 | 887 | 2.4% |
| 36 | 855 | 2.3% |
| 33 | 852 | 2.3% |
| 49 | 849 | 2.3% |
| 30 | 848 | 2.3% |
| 51 | 810 | 2.2% |
| 42 | 793 | 2.1% |
| 35 | 785 | 2.1% |
| 48 | 768 | 2.1% |
| 43 | 750 | 2.0% |
| Other values (84) | 29110 |
| Value | Count | Frequency (%) |
| 7 | 4 | < 0.1% |
| 8 | 6 | < 0.1% |
| 9 | 19 | 0.1% |
| 10 | 40 | 0.1% |
| 11 | 42 | 0.1% |
| 12 | 26 | 0.1% |
| 13 | 168 | |
| 14 | 139 | |
| 15 | 143 | |
| 16 | 82 |
| Value | Count | Frequency (%) |
| 100 | 53 | 0.1% |
| 99 | 132 | |
| 98 | 136 | |
| 97 | 167 | |
| 96 | 86 | |
| 95 | 201 | |
| 94 | 170 | |
| 93 | 146 | |
| 92 | 91 | |
| 91 | 123 |
description_length
Real number (ℝ≥0)
| Distinct | 1797 |
|---|---|
| Distinct (%) | 4.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 905.0065135 |
| Minimum | 1 |
|---|---|
| Maximum | 5260 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 92 |
| Q1 | 356 |
| median | 655 |
| Q3 | 1229 |
| 95-th percentile | 2535 |
| Maximum | 5260 |
| Range | 5259 |
| Interquartile range (IQR) | 873 |
Descriptive statistics
| Standard deviation | 809.0663016 |
|---|---|
| Coefficient of variation (CV) | 0.8939894791 |
| Kurtosis | 4.292087135 |
| Mean | 905.0065135 |
| Median Absolute Deviation (MAD) | 379 |
| Skewness | 1.849870025 |
| Sum | 33763078 |
| Variance | 654588.2804 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 510 | 141 | 0.4% |
| 183 | 135 | 0.4% |
| 335 | 117 | 0.3% |
| 92 | 117 | 0.3% |
| 571 | 115 | 0.3% |
| 298 | 114 | 0.3% |
| 334 | 112 | 0.3% |
| 378 | 108 | 0.3% |
| 357 | 107 | 0.3% |
| 548 | 105 | 0.3% |
| Other values (1787) | 36136 |
| Value | Count | Frequency (%) |
| 1 | 13 | < 0.1% |
| 7 | 4 | < 0.1% |
| 9 | 12 | < 0.1% |
| 11 | 19 | 0.1% |
| 12 | 4 | < 0.1% |
| 13 | 15 | < 0.1% |
| 14 | 2 | < 0.1% |
| 15 | 63 | |
| 16 | 26 | |
| 17 | 19 | 0.1% |
| Value | Count | Frequency (%) |
| 5260 | 8 | |
| 5194 | 17 | |
| 5089 | 6 | < 0.1% |
| 5048 | 3 | < 0.1% |
| 5047 | 3 | < 0.1% |
| 5021 | 1 | < 0.1% |
| 4999 | 3 | < 0.1% |
| 4998 | 13 | |
| 4977 | 1 | < 0.1% |
| 4965 | 12 |
Ratio_View_likes
Real number (ℝ≥0)
| Distinct | 30559 |
|---|---|
| Distinct (%) | 81.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 72.37744249 |
| Minimum | 3.382 |
|---|---|
| Maximum | 30172.871 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 3.382 |
|---|---|
| 5-th percentile | 11.2926 |
| Q1 | 22.2495 |
| median | 36.072 |
| Q3 | 67.312 |
| 95-th percentile | 174.4099 |
| Maximum | 30172.871 |
| Range | 30169.489 |
| Interquartile range (IQR) | 45.0625 |
Descriptive statistics
| Standard deviation | 437.8933185 |
|---|---|
| Coefficient of variation (CV) | 6.050135283 |
| Kurtosis | 3470.021292 |
| Mean | 72.37744249 |
| Median Absolute Deviation (MAD) | 17.688 |
| Skewness | 54.7018187 |
| Sum | 2700185.247 |
| Variance | 191750.5584 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 62.605 | 6 | < 0.1% |
| 32.999 | 6 | < 0.1% |
| 27.972 | 5 | < 0.1% |
| 34.256 | 5 | < 0.1% |
| 22.372 | 5 | < 0.1% |
| 27.333 | 5 | < 0.1% |
| 28.921 | 5 | < 0.1% |
| 13.787 | 5 | < 0.1% |
| 16.634 | 5 | < 0.1% |
| 15.888 | 5 | < 0.1% |
| Other values (30549) | 37255 |
| Value | Count | Frequency (%) |
| 3.382 | 1 | |
| 3.561 | 1 | |
| 3.763 | 1 | |
| 3.942 | 1 | |
| 4.013 | 1 | |
| 4.092 | 2 | |
| 4.271 | 1 | |
| 4.301 | 1 | |
| 4.339 | 1 | |
| 4.386 | 1 |
| Value | Count | Frequency (%) |
| 30172.871 | 1 | |
| 30156.167 | 1 | |
| 29599.495 | 1 | |
| 28917.022 | 1 | |
| 27494.918 | 1 | |
| 27447.741 | 1 | |
| 26532.878 | 1 | |
| 9825.599 | 1 | |
| 9241.006 | 1 | |
| 7919.303 | 1 |
Ratio_View_dislikes
Real number (ℝ≥0)
| Distinct | 37036 |
|---|---|
| Distinct (%) | 99.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1740.412388 |
| Minimum | 4.639 |
|---|---|
| Maximum | 79800.948 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 4.639 |
|---|---|
| 5-th percentile | 221.4069 |
| Q1 | 684.4115 |
| median | 1285.613 |
| Q3 | 2140.7735 |
| 95-th percentile | 4219.4086 |
| Maximum | 79800.948 |
| Range | 79796.309 |
| Interquartile range (IQR) | 1456.362 |
Descriptive statistics
| Standard deviation | 2447.818382 |
|---|---|
| Coefficient of variation (CV) | 1.406458836 |
| Kurtosis | 239.2430401 |
| Mean | 1740.412388 |
| Median Absolute Deviation (MAD) | 674.923 |
| Skewness | 11.9994778 |
| Sum | 64929564.97 |
| Variance | 5991814.832 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 664.309 | 5 | < 0.1% |
| 5.222 | 3 | < 0.1% |
| 988.105 | 3 | < 0.1% |
| 1163.477 | 3 | < 0.1% |
| 664.33 | 3 | < 0.1% |
| 434.807 | 2 | < 0.1% |
| 3742.346 | 2 | < 0.1% |
| 1768.876 | 2 | < 0.1% |
| 2137.911 | 2 | < 0.1% |
| 1999.363 | 2 | < 0.1% |
| Other values (37026) | 37280 |
| Value | Count | Frequency (%) |
| 4.639 | 1 | < 0.1% |
| 5.057 | 1 | < 0.1% |
| 5.092 | 1 | < 0.1% |
| 5.116 | 1 | < 0.1% |
| 5.215 | 1 | < 0.1% |
| 5.221 | 1 | < 0.1% |
| 5.222 | 3 | |
| 5.223 | 1 | < 0.1% |
| 5.225 | 1 | < 0.1% |
| 5.226 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 79800.948 | 1 | |
| 75753.432 | 1 | |
| 63027.477 | 1 | |
| 61286.044 | 1 | |
| 61216.097 | 1 | |
| 60796.14 | 1 | |
| 60588.087 | 1 | |
| 60365.096 | 1 | |
| 60169.494 | 1 | |
| 60162.039 | 1 |
Ratio_views_comment_count
Real number (ℝ≥0)
| Distinct | 36607 |
|---|---|
| Distinct (%) | 98.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 949.772993 |
| Minimum | 13.681 |
|---|---|
| Maximum | 1421762 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 13.681 |
|---|---|
| 5-th percentile | 70.3703 |
| Q1 | 198.9505 |
| median | 381.577 |
| Q3 | 749.4485 |
| 95-th percentile | 2289.5125 |
| Maximum | 1421762 |
| Range | 1421748.319 |
| Interquartile range (IQR) | 550.498 |
Descriptive statistics
| Standard deviation | 12996.15484 |
|---|---|
| Coefficient of variation (CV) | 13.68343271 |
| Kurtosis | 6376.633545 |
| Mean | 949.772993 |
| Median Absolute Deviation (MAD) | 228.008 |
| Skewness | 74.25807043 |
| Sum | 35433181.05 |
| Variance | 168900040.5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 525.32 | 3 | < 0.1% |
| 311.101 | 3 | < 0.1% |
| 189.379 | 3 | < 0.1% |
| 309.203 | 3 | < 0.1% |
| 139.492 | 3 | < 0.1% |
| 79.179 | 3 | < 0.1% |
| 57.449 | 3 | < 0.1% |
| 315.191 | 3 | < 0.1% |
| 130.596 | 3 | < 0.1% |
| 148.699 | 3 | < 0.1% |
| Other values (36597) | 37277 |
| Value | Count | Frequency (%) |
| 13.681 | 1 | |
| 13.74 | 1 | |
| 16.614 | 1 | |
| 17.021 | 1 | |
| 17.324 | 1 | |
| 17.358 | 1 | |
| 17.447 | 1 | |
| 17.672 | 1 | |
| 17.819 | 1 | |
| 18.118 | 1 |
| Value | Count | Frequency (%) |
| 1421762 | 1 | |
| 961489.667 | 1 | |
| 934082 | 1 | |
| 930351 | 1 | |
| 618991 | 1 | |
| 522111.6 | 1 | |
| 464294.833 | 1 | |
| 459688.5 | 1 | |
| 388770.333 | 1 | |
| 387752.143 | 1 |
Ratio_likes_dislikes
Real number (ℝ≥0)
| Distinct | 30257 |
|---|---|
| Distinct (%) | 81.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.45089147 |
| Minimum | 0.031 |
|---|---|
| Maximum | 940.669 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 291.6 KiB |
Quantile statistics
| Minimum | 0.031 |
|---|---|
| 5-th percentile | 3.8492 |
| Q1 | 16.063 |
| median | 31.769 |
| Q3 | 60.696 |
| 95-th percentile | 155.9518 |
| Maximum | 940.669 |
| Range | 940.638 |
| Interquartile range (IQR) | 44.633 |
Descriptive statistics
| Standard deviation | 58.40618173 |
|---|---|
| Coefficient of variation (CV) | 1.181094617 |
| Kurtosis | 24.08165957 |
| Mean | 49.45089147 |
| Median Absolute Deviation (MAD) | 19.509 |
| Skewness | 3.792201279 |
| Sum | 1844864.408 |
| Variance | 3411.282064 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 5.5 | 15 | < 0.1% |
| 0.438 | 15 | < 0.1% |
| 0.041 | 10 | < 0.1% |
| 6.226 | 9 | < 0.1% |
| 3.135 | 9 | < 0.1% |
| 26 | 9 | < 0.1% |
| 54 | 8 | < 0.1% |
| 4.123 | 8 | < 0.1% |
| 7.083 | 7 | < 0.1% |
| 6.42 | 7 | < 0.1% |
| Other values (30247) | 37210 |
| Value | Count | Frequency (%) |
| 0.031 | 1 | < 0.1% |
| 0.032 | 1 | < 0.1% |
| 0.033 | 1 | < 0.1% |
| 0.034 | 2 | < 0.1% |
| 0.035 | 3 | < 0.1% |
| 0.036 | 3 | < 0.1% |
| 0.039 | 1 | < 0.1% |
| 0.041 | 10 | |
| 0.042 | 7 | |
| 0.043 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 940.669 | 1 | |
| 935.958 | 1 | |
| 925.544 | 1 | |
| 840.751 | 1 | |
| 760 | 1 | |
| 692.302 | 1 | |
| 691.167 | 1 | |
| 676.797 | 1 | |
| 651.274 | 1 | |
| 642.8 | 1 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
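As a minimal sketch of the calculation just described (covariance divided by the product of the standard deviations), the function below computes r in plain Python. The name `pearson_r` and the sample data are illustrative assumptions, not part of the report.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]       # hypothetical, roughly linear in xs
rescaled = [10 * y + 3 for y in ys]  # change of scale and location

print(pearson_r(xs, ys))
print(pearson_r(xs, rescaled))  # same value up to floating-point rounding
```

Rescaling and shifting one variable leaves r unchanged, which illustrates the invariance property mentioned above.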
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
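The description above amounts to applying Pearson's r to ranks. A minimal sketch under that reading follows; ties receive the mean of their ranks, which is a common convention assumed here rather than something the report states, and all function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but not linear

print(pearson_r(xs, ys))     # below 1: the relation is not linear
print(spearman_rho(xs, ys))  # 1 up to rounding: the relation is monotonic
```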
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
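The pair-counting definition above can be sketched directly. This is the simple form without a tie correction (often called tau-a); statistical libraries frequently use a tie-corrected variant, so results on data with many ties may differ from the report. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4, 5]
print(kendall_tau(xs, [10, 20, 30, 40, 50]))  # every pair concordant
print(kendall_tau(xs, [50, 40, 30, 20, 10]))  # every pair discordant
print(kendall_tau(xs, [10, 30, 20, 40, 50]))  # one discordant pair out of ten
```

The quadratic loop over all pairs is fine for illustration but would be slow on a column with tens of thousands of observations.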
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
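To make the idea of a nullity matrix concrete, here is a small sketch that builds one as a grid of booleans (present or missing) and counts missing cells per column. The records and helper names are hypothetical; the dataset profiled in this report has no missing cells, so its matrix would be entirely filled.

```python
def nullity_matrix(rows, columns):
    """Return one row of booleans per record: True where the value is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]

def missing_per_column(rows, columns):
    """Count the missing cells in each column."""
    matrix = nullity_matrix(rows, columns)
    return {col: sum(not present[i] for present in matrix)
            for i, col in enumerate(columns)}

# Hypothetical records with two gaps.
columns = ["views", "likes", "comment_count"]
rows = [
    {"views": 100, "likes": 7, "comment_count": 2},
    {"views": 250, "likes": None, "comment_count": 5},
    {"views": 90, "likes": 3},
]

# Text rendering: '#' marks a present cell, '.' a missing one.
for present in nullity_matrix(rows, columns):
    print("".join("#" if p else "." for p in present))
print(missing_per_column(rows, columns))
```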
First rows
| | df_index | category_id | views | likes | dislikes | comment_count | publish_time_1 | publish_weekday | title_length | description_length | Ratio_View_likes | Ratio_View_dislikes | Ratio_views_comment_count | Ratio_likes_dislikes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 26 | 7224515 | 55681 | 10247 | 9479 | 07:38:29 | Friday | 45 | 821 | 129.748 | 705.037 | 762.160 | 5.434 |
| 1 | 1 | 24 | 1053632 | 25561 | 2294 | 2757 | 06:24:44 | Sunday | 41 | 417 | 41.220 | 459.299 | 382.166 | 11.143 |
| 2 | 2 | 10 | 17158579 | 787420 | 43420 | 125882 | 17:00:03 | Friday | 42 | 594 | 21.791 | 395.177 | 136.307 | 18.135 |
| 3 | 3 | 17 | 27833 | 193 | 12 | 37 | 02:30:38 | Monday | 76 | 396 | 144.212 | 2319.417 | 752.243 | 16.083 |
| 4 | 4 | 25 | 9815 | 30 | 2 | 30 | 01:45:13 | Monday | 55 | 151 | 327.167 | 4907.500 | 327.167 | 15.000 |
| 5 | 5 | 24 | 1182775 | 52708 | 1431 | 2333 | 17:00:00 | Saturday | 28 | 819 | 22.440 | 826.537 | 506.976 | 36.833 |
| 6 | 6 | 10 | 33523622 | 1634124 | 21082 | 85067 | 11:04:14 | Thursday | 43 | 1250 | 20.515 | 1590.154 | 394.085 | 77.513 |
| 7 | 7 | 22 | 1164201 | 57309 | 749 | 624 | 19:19:43 | Friday | 29 | 763 | 20.314 | 1554.340 | 1865.707 | 76.514 |
| 8 | 8 | 10 | 154494 | 2163 | 147 | 211 | 08:00:01 | Friday | 48 | 434 | 71.426 | 1050.980 | 732.199 | 14.714 |
| 9 | 9 | 10 | 9548677 | 190084 | 15015 | 11473 | 15:00:00 | Friday | 60 | 690 | 50.234 | 635.943 | 832.274 | 12.660 |
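The four Ratio_* columns appear to be quotients of the raw count columns, rounded to three decimals. That is an inference from the sample rows, not something the report states. The sketch below recomputes them for the first two rows under that assumption; the column pairings in `ratios` are the assumed definitions.

```python
# First two sample rows, copied from the table above.
rows = [
    {"views": 7224515, "likes": 55681, "dislikes": 10247, "comment_count": 9479,
     "reported": (129.748, 705.037, 762.160, 5.434)},
    {"views": 1053632, "likes": 25561, "dislikes": 2294, "comment_count": 2757,
     "reported": (41.220, 459.299, 382.166, 11.143)},
]

def ratios(row):
    """Assumed definitions of the four Ratio_* columns (inferred, not documented)."""
    return (
        row["views"] / row["likes"],          # Ratio_View_likes
        row["views"] / row["dislikes"],       # Ratio_View_dislikes
        row["views"] / row["comment_count"],  # Ratio_views_comment_count
        row["likes"] / row["dislikes"],       # Ratio_likes_dislikes
    )

for row in rows:
    for computed, reported in zip(ratios(row), row["reported"]):
        print(f"{computed:.3f} vs {reported:.3f}")
```

If these definitions hold, the strong correlations flagged among the Ratio_* columns are partly structural, since the ratios share numerators and denominators.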
Last rows
| | df_index | category_id | views | likes | dislikes | comment_count | publish_time_1 | publish_weekday | title_length | description_length | Ratio_View_likes | Ratio_View_dislikes | Ratio_views_comment_count | Ratio_likes_dislikes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37297 | 38906 | 10 | 3839019 | 79766 | 1662 | 5639 | 17:00:01 | Thursday | 22 | 1533 | 48.129 | 2309.879 | 680.798 | 47.994 |
| 37298 | 38907 | 10 | 7608552 | 93096 | 6025 | 4222 | 04:00:00 | Thursday | 52 | 697 | 81.728 | 1262.830 | 1802.120 | 15.452 |
| 37299 | 38908 | 24 | 2665975 | 26126 | 599 | 3377 | 16:00:03 | Wednesday | 59 | 343 | 102.043 | 4450.710 | 789.451 | 43.616 |
| 37300 | 38909 | 10 | 6078793 | 75335 | 2106 | 1269 | 16:27:39 | Wednesday | 51 | 465 | 80.690 | 2886.416 | 4790.223 | 35.772 |
| 37301 | 38910 | 10 | 1939400 | 169578 | 1202 | 5889 | 16:00:09 | Thursday | 42 | 1341 | 11.437 | 1613.478 | 329.326 | 141.080 |
| 37302 | 38911 | 10 | 25066952 | 268088 | 12783 | 9933 | 07:00:01 | Wednesday | 61 | 890 | 93.503 | 1960.960 | 2523.603 | 20.972 |
| 37303 | 38912 | 10 | 1492219 | 61998 | 13781 | 24330 | 17:09:16 | Friday | 51 | 475 | 24.069 | 108.281 | 61.332 | 4.499 |
| 37304 | 38913 | 10 | 29641412 | 394830 | 8892 | 19988 | 11:05:08 | Tuesday | 34 | 321 | 75.074 | 3333.492 | 1482.960 | 44.403 |
| 37305 | 38914 | 24 | 14317515 | 151870 | 45875 | 26766 | 20:32:32 | Tuesday | 75 | 195 | 94.275 | 312.098 | 534.914 | 3.311 |
| 37306 | 38915 | 10 | 607552 | 18271 | 274 | 1423 | 04:06:35 | Friday | 51 | 500 | 33.252 | 2217.343 | 426.952 | 66.682 |